A New Test of Cluster Hypothesis Using a Scalable Similarity-Based Agglomerative Hierarchical Clustering Framework
Abstract
The cluster hypothesis is the fundamental assumption behind the use of clustering in information retrieval. It states that similar documents tend to be relevant to the same query. Past work has tested this hypothesis extensively with agglomerative hierarchical clustering (AHC) methods, but its conclusions on retrieval effectiveness are not consistent. The main limitation of that work is the scalability problem of AHC. In this paper, we extend our previous work to a new test of the cluster hypothesis by applying a scalable similarity-based AHC framework. Chiefly, the cosine similarity matrix is sparsified by thresholds to reduce memory usage and computation time. Our results show that even when the matrix is heavily sparsified, retrieval effectiveness is maintained for all methods, and that the complete-link and average-link methods do not always dominate the others.
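The threshold-based sparsification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `>=` cut-off rule, and the toy documents are assumptions made for illustration only. Documents are stored as term-weight dictionaries, and only the pairs whose cosine similarity reaches the threshold are kept.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sparsified_similarities(docs, threshold):
    """Return {(i, j): sim} for i < j, keeping only pairs with sim >= threshold.

    Pairs below the threshold are simply not stored (hypothetical rule).
    """
    kept = {}
    for i, j in combinations(range(len(docs)), 2):
        s = cosine(docs[i], docs[j])
        if s >= threshold:
            kept[(i, j)] = s
    return kept


# Toy term-weight vectors, invented for this example.
docs = [
    {"cluster": 2.0, "retrieval": 1.0},
    {"cluster": 1.0, "retrieval": 1.0},
    {"matrix": 3.0},
]
sims = sparsified_similarities(docs, threshold=0.5)
```

In this toy case only the pair (0, 1) survives, because the third document shares no terms with the other two. The sketch still evaluates every pair, so it shows only the memory saving of storing fewer entries. Reducing computation time as well would require avoiding the dissimilar pairs in the first place, for example with an inverted index.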
Similar resources
Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis
Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationships among the data objects and classify them into meaningful subgroups. The effectiveness of clustering algorithms depends on the appropriateness of the similarity measure between the data in which the similarity can be computed. This pap...
A Relative Approach to Hierarchical Clustering
This paper presents a new approach to agglomerative hierarchical clustering. Classical hierarchical clustering algorithms are based on metrics which only consider the absolute distance between two clusters, merging the pair of clusters with highest absolute similarity. We propose a relative dissimilarity measure, which considers not only the distance between a pair of clusters, but also how dis...
Dynamic Hierarchical Compact Clustering Algorithm
In this paper we introduce a general framework for hierarchical clustering that deals with both static and dynamic data sets. From this framework, different hierarchical agglomerative algorithms can be obtained, by specifying an inter-cluster similarity measure, a subgraph of the β-similarity graph, and a cover algorithm. A new clustering algorithm called Hierarchical Compact Algorithm and its ...
D-metric Spaces in Agglomerative Clustering
Hierarchical agglomerative clustering merges clusters based on their distance similarity. In this paper we present a new mathematical method called D-metric spaces, due to the Indian mathematician B. C. Dhage, who submitted his thesis in 1984 at Maratwada University. We present an algorithm for hierarchical clustering using the D-metric concept instead of Euclidean distance which reduces the total num...
Agglomerative Learning Algorithms for General Fuzzy Min-Max Neural Network
In this paper, two agglomerative learning algorithms based on new similarity measures defined for hyperbox fuzzy sets are proposed. They are presented in the context of clustering and classification problems tackled using a general fuzzy min-max (GFMM) neural network. The proposed agglomerative schemes have shown robust behaviour in the presence of noise and outliers and insensitivity to the order of ...